ABSTRACT

The effects of the protocol film series, Concepts and Patterns in
Teacher-Pupil Interaction, were measured by a film-based test after
the test had undergone validation. The test, Categorizing Teacher
Behavior, evidenced a high level of reliability and validity in a
graduate level course in educational psychology. The test was judged
valid by a jury of experts and showed a significant increase in mean
test performance from a pretest situation to a posttest situation.
Further, there was a significant decrease in variances from pretest
to posttest. There was evidence that the use of the Concept and
Pattern films in an instructional setting (a course in educational
psychology) had a significant effect on concept acquisition as
measured by the film-based test. The evidence suggests that there was
a gain on all of the concepts (probing, informing, approving,
disapproving, productive questioning, and reproductive questioning)
from a time period prior to instruction to a time period after
instruction. (Author)
Measuring the Effects of a Protocol Film Series:
Instrument Development and Use

Broadly speaking, protocol materials in teacher education are designed
to be used in the acquisition of concepts referring to behavioral events,
particularly those occurring in classroom settings. The acquisition of
such concepts, which may refer to behaviors in either the pedagogical or
subject matter areas of instruction, should in turn lead to a greater
degree of interpretive competence. In defining the need for materials to
accomplish these ends, Smith (1969) hypothesized that such interpretive
competence provided the conceptual background necessary for the
development of skills in teaching.

At an operational level, concept acquisition implies the development
of skillfulness in discriminating and categorizing ongoing behaviors in
terms of a more or less complex set of concepts or categories. Practically
speaking, this means that the use of protocol materials designed to "teach"
a prescribed set of concepts should result in a high degree of accuracy
in selecting and "sorting" the observed behaviors into those categories.
In short, the minimal performance criterion for learning from protocol
materials is skillfulness in categorizing complex behaviors in terms of
specified concepts.

The evaluation task for the developers of any set of protocol materials,
then, becomes two-fold: first, to construct and validate an instrument
designed specifically to measure acquisition of those concepts upon which
that set of protocol materials is based; second, to evaluate the effects
of training based upon the use of that set of protocol materials by means
of this instrument. The present paper addresses these two evaluation tasks
in that order. The most direct outcome of the evaluation studies reported
in this paper is empirical evidence on the effectiveness of a set of
protocol materials in producing gains in acquisition of a specified set of
concepts. The more general contributions of this evaluation study are
(1.) the design of a practical and objective format for assessing concept
acquisition by means of a film-based (and thus, "observation-based") test
and (2.) the development of a general strategy for assessing "mastery" in
such learning tasks as concept acquisition. This strategy was, in fact,
developed to resolve a major measurement problem posed in this evaluation
study: the validation of a criterion-referenced instrument (Brown and
Pugh, 1975).
Protocol Film Series 

The protocol films that are the subject of this evaluation study are
those in the Concepts and Patterns in Teacher-Pupil Interaction series
developed through the Indiana University Protocol Materials Project. The
purpose of the nine films in this series is to define, exemplify, and
provide practice in the interpretive use of a specified set of concepts
referring to teacher behavior in a classroom interactive setting. Six
categories of teacher behavior were selected from the empirical literature
on classroom behavior and adapted to the specific instructional purposes
of this film series. Together with their definitions, they are:
REPRODUCTIVE QUESTION - a teacher question intended to directly elicit the
recall of content specifically learned as part of a course or topic
of study. In response to such a question, the student is expected
to accurately reproduce such content or to recognize when it is
accurately reproduced by someone else. Typical student responses are
repetition, restatement or recognition of content.

PRODUCTIVE QUESTION - a teacher question that is intended to encourage the
production of ideas or combinations of ideas as opposed to simply the
reproduction of specifically learned content. A student response to
such a question may reflect the recall of specifically learned content
but that content is used in such processes as interpretation,
application, and evaluation.

PROBING - a teacher reaction in the form of a question or implied question
that pursues some aspect of the substantive content of a preceding
student response. Such probes typically seek further description,
clarification, explanation, or extension of that substantive content.
By "preceding response" is meant any preceding response including,
but not limited to, the student response that immediately precedes a
teacher reaction. By "substantive content" is meant the formal
content of classroom discussion as opposed to such procedural content
as assignment making, the order of discussion, or disciplinary matters.

INFORMING - a teacher reaction in which information is introduced that is
related to some aspect of the substantive content of a preceding
student response. Such a reaction is often intended to produce some
modification in the substantive content of that student response.
By "preceding response" is meant any preceding response including,
but not limited to, the student response that immediately precedes
a teacher reaction. By "substantive content" is meant the formal
content of classroom discussion as opposed to such procedural content
as assignment making, the order of discussion, or disciplinary matters.

APPROVING - a verbal and/or nonverbal teacher reaction that is intended
to encourage, or might reasonably be expected to encourage, continued
student responding or a continuation of student behavior.

DISAPPROVING - a verbal and/or nonverbal teacher reaction that is intended
to discourage, or might reasonably be expected to discourage, continued
student responding or a continuation of student behavior.

As indicated above, the specific behaviors referred to by these
concepts are commonly described and referenced in the literature on teacher
behavior (see, for example, Dunkin and Biddle, 1974). In fact, it is in
part this commonality of use and the existence of some related empirical
literature that led to the selection of these concepts. For purposes of
clear definition and portrayal, these concepts were arranged in the
following pairs: approving and disapproving, reproductive questioning and
productive questioning, probing and informing. Minimally, use of the
Concepts and Patterns series in instruction should result in skillfulness
in using these concepts as interpretive categories. It is also entirely
plausible to hypothesize that effective use of the series might also
result in acquisition of the skills referenced by these concepts. Although
that hypothesis was not tested in the studies reported in this paper, some
aspects of it are currently being investigated by the authors.

The films in the series are 16mm. motion pictures in color with
synchronous sound. Each film is quite brief, ranging from five to eleven
minutes in length. The series is arranged in two subsets: three Concept
films, each of which introduces, defines and exemplifies a pair of concepts;
six Pattern films, each portraying classroom behavior that can be
interpreted in terms of varying combinations of these concepts. The series
is designed to be used in conventional instructor-led discussion or
presentation. The individual films may be used in a variety of sequences,
although the most common is probably that of a Concept film followed by one
or more relevant Pattern films.

Test Validation 
Test Development and Characteristics

The development of the filmed test, entitled Categorizing Teacher
Behavior, was central to the evaluation plan for the entire film series.
The "logic" of developing an entirely film-based protocol series led quite
reasonably to the specification that the principal evaluation instrument
also be film-based. It was felt that the task of categorizing behavior
portrayed on film would most closely approximate the interpretation of
behavior in actual classrooms.

The test consists of 30 brief classroom episodes, each less than one
minute in length. These episodes were selected from "out-takes" from the
films in the series (that is, from footage not included in the Concept
and Pattern films themselves). As a consequence, the classroom settings,
subject matter context, teachers and pupils are the same as in the Concept
and Pattern films. However, the specific episodes appearing in the test
film do not appear in the other films of the series.

Certain facts about the use of film as a medium in test development
should be noted. Although all episodes for the test were drawn from scenes
recorded on film and although the test is distributed as a film, the selected
footage was transferred to videotape for purposes of test development and
preliminary trial. Consequently, in terms of numbers of subjects, most of
the data obtained during validation were based on a videotaped
representation of the test. The use of videotaped preliminary versions for
the gathering of data is easily explained. As with printed tests, film-based
tests must undergo a number of revisions, and as a medium, film is both
expensive and difficult to revise. Videotape transfers, on the other hand,
are both economical and convenient to modify. The authors' expectation
was that since the changes introduced by this shift in medium were primarily
technical (for example, absence or presence of color and relative sharpness
of image), the magnitude and pattern of quantitative results would not be
substantially different for the two media forms. In fact, a comparison
of the results obtained for the videotape and final film forms of the test
confirms this expectation.

The test is divided into three parts, providing for all possible
combinations of the pairs of concepts. Part I requires that the teacher
behavior in each episode be categorized in terms of probing, informing,
approving and disapproving. In Part II, the concepts involved are productive
questioning, reproductive questioning, approving and disapproving. In
Part III, the concepts are probing, informing, productive questioning and
reproductive questioning. Ten episodes are included in each part; within
certain practical limits imposed by editing possibilities, these episodes
are representative of the content of the Pattern films themselves. Separate
test items for each of four concepts are presented for each episode. In
the case of each item, a "yes" or "no" option is provided for the question
of whether or not a specified concept is illustrated in the episode. As
a consequence, a total of 120 items (20 items for each concept) are
contained in the test.
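
The item counts above follow directly from the part structure. As a minimal sketch, the layout can be enumerated as below; the names `PARTS`, `build_items` and `items_per_concept`, and the numbering of episodes 1 to 10 within each part, are illustrative assumptions and not part of the published test.

```python
from itertools import product

# Concepts tested in each part, as listed in the text.
PARTS = {
    "I": ["probing", "informing", "approving", "disapproving"],
    "II": ["productive questioning", "reproductive questioning",
           "approving", "disapproving"],
    "III": ["probing", "informing", "productive questioning",
            "reproductive questioning"],
}
EPISODES_PER_PART = 10  # ten episodes per part (numbering is hypothetical)


def build_items():
    """Return one (part, episode, concept) tuple per yes/no item."""
    items = []
    for part, concepts in PARTS.items():
        episodes = range(1, EPISODES_PER_PART + 1)
        for episode, concept in product(episodes, concepts):
            items.append((part, episode, concept))
    return items


def items_per_concept(items):
    """Count how many items refer to each concept."""
    counts = {}
    for _, _, concept in items:
        counts[concept] = counts.get(concept, 0) + 1
    return counts


if __name__ == "__main__":
    items = build_items()
    print(len(items))                # total number of items
    print(items_per_concept(items))  # items per concept
```

Each pair of concepts appears in two of the three parts, which is why every concept ends up with the same number of items.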

During development, the test was subjected to scrutiny by six members
of a panel, each of whom took the test independently and then deliberated
together until unanimous agreement was reached on the correct responses to
each item. Only items for which unanimous agreement was reached were
retained. From the set of items and episodes emerging from this refinement,
episodes were selected for each of the major parts of the test.
Each episode is presented twice with a delay of from two to four
seconds between presentations; the purpose of this repeated presentation
is to minimize dependence on recall of the specific episode as a factor
in the measurement of concept acquisition. Following the second
presentation of each episode, the examinee is given a fifteen second period
in which to respond by recording his answer on a separate answer sheet. The
total time for the test (including repeated episodes, delays between
episodes, and delays for responding) is approximately 35 minutes.

The restriction of items for each episode to four rather than six
concepts was planned to achieve a balance between "stimulus simplicity"
and the experience of "stimulus overload." The resulting "response task"
was thought to be at an optimum level of difficulty. Overall, the examinee
is required to indicate the presence or absence of each concept 20 times.
The limitation of each form to a total of 120 items provides for a
reasonably short test administration time as well as a reasonable
opportunity to demonstrate concept acquisition.

In Table 1 is presented the frequency of occurrence of each concept
for each part of the test. There are 20 episodes within the total test
in which each concept might have occurred and 10 episodes within each part
of the test in which each concept might have occurred. By comparing actual
frequency of occurrence to the number of episodes (in each of which a given
concept might occur) one can obtain the ratio of instance to non-instance
items for each part and concept.



Insert Table 1 about here 
In terms of a potential source of response bias, an ideal ratio of
instance to non-instance items might have been 50:50. However, the method
of item selection precluded obtaining an ideal ratio. In the case of this
test, the representativeness of the items and episodes was considered to
be very important even though the ratios of instance to non-instance
correct answers for selected parts of the test deviated from the ideal. A
"yes" or a "no" response on a given item has about the same probability
of being correct for "approving" (12 yes to 8 no correct answers),
"productive questioning" (10 yes to 10 no correct answers), and
"disapproving" (9 yes to 11 no correct answers). For "probing," the ratio
is 8 yes to 12 no; for "informing" it is 7 yes to 13 no; for "reproductive
questioning" it is 6 yes to 14 no.

In Table 2 is presented the number of concepts instanced in each
episode. From the Table, it can be seen that the typical number of concepts
instanced for each episode is one or two. All episodes contain instances
of one, two or three concepts.



Insert Table 2 about here 

Definition of Strategy to Assess Test Characteristics

As a test, Categorizing Teacher Behavior is criterion-referenced
because the content was derived from the meaning of the six underlying
concepts (probing, informing, approving, disapproving, reproductive
questioning and productive questioning) and a mastery performance level
was sought. Since there are special problems in the appropriateness
of some of the conventional test characteristic indices, a strategy
developed by Brown and Pugh (1975) was adopted to portray the
characteristics of the test.

The strategy devised to assess the test characteristics assumed that
the objective of the film-based instructional treatment was to bring the
learner from a pre-treatment state below the criterion performance level
to criterion level. It also assumed that if Categorizing Teacher Behavior
detects this change of learning state consistently over the six concepts,
over variations in the instructional setting, over separate groups of
learners, and over half-tests within the test, then one has evidence both
for (1.) the reliability and validity of the test and (2.) the reliability
and validity of the instructional treatment.

The steps taken to establish the content validity of the test were
(1.) developing a substantial number of episodes exemplifying the concepts,
(2.) arranging for independent judging of the items by a jury of staff
members who had acquired the concepts, (3.) balancing the number of items
for each concept, and (4.) including a sufficient number of items so that
each half-test is meaningful.

Pre- and post-treatment states. In the development of the test, it
was most reasonable to assume a pre-treatment state reflecting some amount
of prior learning. Since the test was developed to be used with both
preservice and inservice teachers, it was plausible to conclude that the
six concepts might be partially acquired prior to treatment. For the
pre-treatment state, then, the mean of the test was expected to be above the
50 percent (or chance) level of difficulty; it was also expected that the
variance would be greater than at the post-treatment state. The latter
state was, consequently, expected to be characterized by a higher mean and
attenuated variance.

Predicted relationships between treatment states. Since the test was
to be administered to subjects for whom some prior learning may have
occurred, certain relationships between treatment states (i.e.,
pre-treatment and post-treatment states) were predicted. Repeated
measurements across the two treatment states should show an increase in
means and a decrease in variance. These directional predictions should hold
for test composite scores and half-test composite scores. All of these
composite comparisons should show statistical significance. The directional
predictions should also hold for test concept scores and half-test concept
scores; however, the actual directional comparisons may not be individually
statistically significant using test concept and half-test concept scores.

All of these predictions were directional and assumed a comparison
between groups of students who received effective training on the concepts
through the use of the Concepts and Patterns film series and either (1.)
the same groups in a pre-treatment state or (2.) an independent
pre-treatment group. An additional set of predictions was stated involving
comparisons between trained groups. In this case, if no consistent
differences were found, additional evidence of test reliability and validity
would be demonstrated.
Validation Studies

Instructional treatment. The Concepts and Patterns protocol films
were used as part of regular classroom instruction in several intact
sections of a graduate level course in educational psychology. The students
enrolled in these sections were representative of those normally enrolled
in the course: heterogeneous in age, experience and ability. The topics
of teacher behavior and teacher-pupil interaction were a conventional part
of the course. As indicated previously, all students may well have gained
some general familiarity with one or more of the concepts from instruction
in previous courses. All students in the first validation study were
familiarized also with the specific concept definitions during the
five-minute orientation prior to administration of the test. Thus, pretest
performance in the first study was potentially influenced by both unplanned
and planned prior learning; this influence probably was reflected in
average pretest scores that were well above chance level. In the second
study, the pretest orientation to the concept definitions was omitted; as
a result, average pretest scores were closer to chance level.

Scoring procedure. In the case of the first validation study,
responses were scored as correct (that is, an accurate categorization of the
teacher's behavior); partially correct/partially incorrect (that is, an
incomplete categorization of the teacher's behavior); or incorrect (an
inaccurate categorization of the teacher's behavior). Two points were
allowed for each correct response and one point for each partially correct/
partially incorrect response. A perfect score was 240 points and a
chance level score was 120 points. A perusal of examinees' answers
indicated, however, that the partially correct/partially incorrect response
was seldom used. As a result, this option was discontinued for the second
validation study. In that study, all items were scored simply as correct or
incorrect; thus, a perfect score was 120 points.
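
The two scoring rules can be sketched as follows. This is only an illustration of the point values stated above: the response labels ("correct", "partial", "incorrect") and the function names are hypothetical, and the source does not describe how responses were actually recorded or tallied.

```python
N_ITEMS = 120  # items in the test, per the text

# Point values for the first validation study, per the text.
FIRST_STUDY_POINTS = {"correct": 2, "partial": 1, "incorrect": 0}


def score_first_study(responses):
    """Three-level scoring: 2 points correct, 1 partial, 0 incorrect."""
    return sum(FIRST_STUDY_POINTS[r] for r in responses)


def score_second_study(responses):
    """Two-level scoring: 1 point per correct item, 0 otherwise."""
    return sum(1 for r in responses if r == "correct")


if __name__ == "__main__":
    all_correct = ["correct"] * N_ITEMS
    half_correct = ["correct"] * 60 + ["incorrect"] * 60
    print(score_first_study(all_correct))    # maximum, first study
    print(score_first_study(half_correct))   # half the items correct
    print(score_second_study(all_correct))   # maximum, second study
```

Under the first rule, an examinee with half of the items fully correct and the rest incorrect earns 120 points, which matches the chance-level score quoted above.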

Data analysis. Designs were constructed which allowed for the
statistical test of directional as well as nondirectional comparisons using
the total composite score and odd/even split-half composite scores. For the
directional comparisons, it was predicted that means would consistently
increase and variances would consistently decrease from pre-instruction
to post-instruction conditions. Further, concept scores would follow a
similar directional pattern. The analysis is reported separately for each
of the two studies.

First study design and results. The design used for the first study
was a variation of the separate-sample pretest-posttest design with random
assignment, within course sections, into two groups. This design was
appropriate to the field setting and had superior external validity
characteristics. The effects of pretesting and interaction of testing with
treatment were controlled. The design is depicted as follows:

     O_A   X   O_B

           X   O_C

From this design, two sets of directional contrasts of means were predicted,
B > A and C > A. Conversely, the directionality of contrasts of variances
was predicted to be B < A and C < A. A nondirectional contrast, B ≠ C,
was also tested. A summary of the descriptive statistics is reported in
Table 3.



Insert Table 3 about here 

For the sets of directional contrasts of means and variances, the
predicted trends using odd/even split-half composite scores and total
composite scores were found. The B < A contrasts, which involved correlated
data, showed significant differences in variance [t(39) = 2.26 to 4.11,
p < .05]. After using logarithmic transformations to obtain homogeneous
variances, the B > A contrasts of mean differences were significant
[F(1,40) = 29.28 to 51.26, p < .001]. The C < A contrasts of differences
in variance [F(40,47) = 1.87 to 2.70, p < .05] and C > A contrasts of
differences in means [t(64 to 72) = 2.17 to 2.64, p < .05] were
significant. Since these contrasts involved independent data, t-tests using
separate sample estimates of the population variance were used to test
mean differences.
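
A t-test that uses separate sample variance estimates is commonly computed in the unpooled (Welch) form, whose degrees of freedom vary with the sample variances; that would be consistent with the range of degrees of freedom reported above, although the source does not name the procedure. The sketch below is written under that assumption, and the function name and the data in the usage lines are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Unpooled two-sample t statistic for (mean of b) - (mean of a).

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error, group a
    se2_b = variance(sample_b) / n_b  # squared standard error, group b
    t = (mean(sample_b) - mean(sample_a)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up scores, not data from the study.
    pretest_group = [1, 2, 3, 4, 5]
    posttest_group = [2, 4, 6, 8, 10]
    print(welch_t(pretest_group, posttest_group))
```

A positive t here corresponds to the predicted C > A direction when the untreated group is passed first.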

The nondirectional contrasts, B ≠ C, resulted in nonsignificant
[F(47,49) = 1.16 to 1.62, p > .10] variance differences but mean differences
were consistently significant [t(87) = 2.91 to 3.97, p < .01] for odd/even
split-half composite scores and total composite scores. For each contrast
of means, B was greater than C, a difference that was attributed to the
pretesting experience of the B Ss.

In addition, concept scores and odd/even split-half concept scores
were generated and the directionality of mean and variance differences was
compiled. These results are reported in Table 4.



Insert Table 4 about here 

As reported in Tables 4 and 5, all the B-A contrasts of concept
scores and odd/even split-half concept scores were in the predicted
direction. That is, all variances were lower after instruction and all means
were higher after instruction. With six concept scores and twelve
split-half concept scores for both means and variances, 36 of 36 a priori
predicted changes were found. For the C-A contrasts, 10 of 12 concept
score predictions and 18 of 24 split-half concept score predictions were
supported.
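
One rough way to gauge counts such as these is a binomial (sign-test style) tail probability: the chance of at least k confirmations out of n if each prediction had an even chance of holding. This is a sketch under the assumption that the predictions are independent, which is doubtful here because split-half scores share examinees and items; it is not presented as the procedure used by the authors.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1)
    )


if __name__ == "__main__":
    # Counts quoted in the text: (confirmed, total) predictions.
    for k, n in [(36, 36), (10, 12), (18, 24)]:
        print(k, n, prob_at_least(k, n))
```

Under these assumptions each of the three counts would be unlikely to arise by chance alone, with the unanimous 36 of 36 result the least likely.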

Insert Table 5 about here

Second study design and results. The final filmed version of the
test was used in a graduate level educational psychology course following
a pretest and posttest design. While it is acknowledged that the effects
of pretesting and interaction of testing with treatment were not controlled,
the absence of changes in means and variances would raise serious questions
about the test and/or the effectiveness of the films. For this reason,
the results are reported as evidence for the characteristics of the filmed
version of the test.

The examinees who took the test as a pretest were simply introduced
to the concept names as part of the test; no specific orientation on the
concepts was provided, as had been done in the first study. It was expected
that the means would increase and the variances would decrease from pretest
to posttest. A summary of the descriptive statistics is provided in
Table 6.



Insert Table 6 about here 

The predicted trends were found using odd/even split-half composite
scores and total composite scores. The correlated variances of the posttest
scores were significantly lower than those of the pretest scores
[t(17) = 3.V6 to 5.96, p < .01]. The means of the posttest scores were
significantly higher than those of the pretest scores
[F(1,18) = 22.58 to 35.17, p < .001].
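
A standard test for equality of variances in paired (correlated) data is the Pitman-Morgan test, which yields a t statistic on n - 2 degrees of freedom. The source reports t statistics for correlated variances but does not name the test, so the sketch below is an assumption about the kind of computation involved; the function names and the scores in the usage lines are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pitman_morgan_t(pre, post):
    """Pitman-Morgan t for H0: equal variances in paired samples.

    A positive t means the pretest variance exceeds the posttest
    variance. Returns (t, df) with df = n - 2. Undefined when the two
    sets of scores are perfectly correlated (|r| = 1).
    """
    n = len(pre)
    f = variance(pre) / variance(post)  # ratio of sample variances
    r = pearson_r(pre, post)
    t = (f - 1) * sqrt(n - 2) / (2 * sqrt(f * (1 - r ** 2)))
    return t, n - 2


if __name__ == "__main__":
    # Made-up paired scores, not data from the study.
    pre = [0, 2, 4, 6]
    post = [1, 2, 3, 5]
    print(pitman_morgan_t(pre, post))
```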

The concept scores and odd/even split-half concept scores of the
pretest were compared with the same scores on the posttest. The predicted
directionalities of means and variances were found and are reported in
Table 7. The probability of obtaining each number of predicted differences
by chance alone is reported in Table 8.



Insert Table 7 about here 

From Tables 7 and 8, it can be seen that all the B-A contrasts of
total concept scores and odd/even split-half concept scores were in the
predicted direction. The means from the posttest were higher than the means
from the pretest and the variances from the posttest were lower than the
variances from the pretest. Although several alternative explanations might
be offered, the test consistently gave indications of being sensitive to
instruction through the use of the Concept and Pattern films.



Insert Table 8 about here 

The following conclusion regarding the test characteristics of
Categorizing Teacher Behavior seems warranted: by virtue of the proportion
of predictions that were substantiated, the test evidenced a high degree
of reliability (consistency) and validity. Directional predictions were
confirmed using composite test scores and composite half-test scores in
most instances. There clearly was a discernible and significant gain in
mean test performance which can be accounted for by instruction through
the use of the protocol films. Further positive evidence on the
characteristics of the test was found using full test concept scores and
half-test concept scores. The overwhelming majority of these comparisons
substantiated a priori predictions.
Response Bias 

As a supplementary analysis to the strategy of assessing the technical
characteristics of this test, response bias was investigated. The test
was constructed with the realization that, even without knowledge of the
concepts, examinees could conceivably score higher than chance level by
biasing their responses; that is, one could try to "second guess" the test.
This possibility is due primarily to the fact that instancing of the concepts
was done with a view to approximating a realistic classroom setting rather
than constructing an ideal correct response distribution.

It is quite reasonable to conclude that longer classroom episodes
might evidence a greater number of concepts. In fact, even though all
episodes are less than one minute in length, the longer episodes do tend
to illustrate more concepts. There was a zero-order correlation coefficient
of .37 (df = 28, p < .05) between the length of the episode (measured in
seconds) and the number of concepts instanced. Since this relationship
was found, the tendency for an examinee who has received concept training
to respond "yes" in the longer episodes was investigated by computing the
average number of "yes" responses for each episode in the first study.
Further, first-order partial correlation coefficients were computed between
the length of the episode (in seconds) and the average number of "yes"
responses per episode with the correct "yes" responses partialled out.
The first-order partial correlation of .11 (t(27) = .55, p > .05) was
insignificant. This finding indicated that a "yes" response bias in the case
of longer episodes was unlikely. It was concluded that "score inflation"
by such response behavior was not present in that study.
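
The first-order partial correlation and its t statistic follow standard textbook formulas, sketched below. The function and argument names are illustrative, and the underlying pairwise correlations are not reported in this section, so only the conversion from a correlation to t can be checked against the text.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def t_from_r(r, df):
    """t statistic for a correlation coefficient with the given df.

    df is n - 2 for a zero-order correlation and n - 3 for a
    first-order partial correlation.
    """
    return r * sqrt(df) / sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Coefficients and degrees of freedom quoted in the text.
    print(t_from_r(0.37, 28))  # zero-order: length vs. concepts instanced
    print(t_from_r(0.11, 27))  # partial: length vs. average "yes" count
```

With the rounded coefficient of .11 the formula gives a t of roughly .58, close to the reported .55; the small gap presumably reflects rounding of the coefficient before publication.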

Another possibility for examinees in biasing responses was to respond
to each item with "no." A person who used this response pattern could score
somewhat higher than chance level (fifty percent correct) since concepts
are more often absent than present. It was felt that the general and often
demonstrated tendency to respond "yes" when in doubt made the possibility
of this type of bias negligible.
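
The size of this advantage can be worked out from the answer-key counts given earlier in the paper (yes to no correct answers per concept). The sketch below simply tallies them; the names `KEY_COUNTS` and `constant_response_score` are illustrative.

```python
# (yes, no) correct-answer counts per concept, as quoted in the paper.
KEY_COUNTS = {
    "approving": (12, 8),
    "productive questioning": (10, 10),
    "disapproving": (9, 11),
    "probing": (8, 12),
    "informing": (7, 13),
    "reproductive questioning": (6, 14),
}


def constant_response_score(answer):
    """Items correct, and total items, for an examinee who gives the
    same answer ("yes" or "no") to every item."""
    idx = 0 if answer == "yes" else 1
    correct = sum(counts[idx] for counts in KEY_COUNTS.values())
    total = sum(yes + no for yes, no in KEY_COUNTS.values())
    return correct, total


if __name__ == "__main__":
    for answer in ("yes", "no"):
        correct, total = constant_response_score(answer)
        print(answer, correct, total, round(correct / total, 3))
```

Answering "no" throughout yields 68 of 120 items correct, about 57 percent, while answering "yes" throughout yields 52 of 120, about 43 percent.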
Effects of Protocol Based Instruction
Having gathered supporting evidence on its validity as an evaluation
instrument, Categorizing Teacher Behavior was next used to assess the
results of protocol based instruction in terms of concept acquisition. In
a real sense, of course, the same type of evidence (data on change in
concept acquisition attributable to training) is used to present information
both on test validity and instructional effects. However, in this section,
the data will be viewed from the perspective it provides on training effects
for all concepts rather than from the perspective of the technical qualities
of the test.
Instructional Treatment 

Selected films from the Concepts and Patterns series were used in
instruction with a graduate level class in educational psychology. The
class was composed of both experienced and inexperienced teachers
representing a wide variety of teaching or administrative levels and areas
in professional education. The class met for one evening session of
approximately two and one half hours each week. The instructor, one of the
authors of this paper, consistently takes a "teaching skill or behavior"
approach to this course; thus, the content of this film series fits naturally
into the context of the course.

All three Concept films and a specifically selected four of the six
Pattern films were used in instruction. The Concept films were used to
present the concepts; analysis of the behavior portrayed in these films
and in the Pattern films was conducted through instructor-led class
discussion. A few brief related readings were assigned as collateral
work. The class was advised that performance on a posttest would be used
to determine letter grades for the unit. A total of approximately eight
hours of classroom time over a four week period was devoted to this
instructional treatment (including both pretesting and posttesting).
Design and Results

As in the first validation study, a variation of the separate sample
pretest and posttest design was used. As shown in the diagram below, a
random half of the total group was pretested with Categorizing Teacher
Behavior (O1). All students were posttested, following instructional
treatment, with Categorizing Teacher Behavior. The results are reported
separately, however, for those students who had taken the test as a pretest
(O2) and those students who had not been pretested (O3). It is important
to note that in neither the pretest nor the posttest did students have
access, either immediately before or during testing, to concept definitions.
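
The random-half assignment described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the roster of 53 student IDs (the two posttest group sizes in Table 9 sum to 53), the seed, and the function name split_random_half are hypothetical, and the paper does not describe the mechanics actually used to draw the random half.

```python
import random


def split_random_half(student_ids, seed=None):
    """Randomly split a roster into a pretested half and a non-pretested half.

    Hypothetical helper: the first half of a shuffled roster is pretested,
    the remainder takes the posttest only.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


if __name__ == "__main__":
    roster = range(1, 54)  # hypothetical IDs 1..53
    pretested, not_pretested = split_random_half(roster, seed=0)
    print(len(pretested), "students pretested;", len(not_pretested), "posttest only")
```

In practice the realized groups need not be equal halves; the footnote to this paper notes that absences and late arrivals kept the randomization criterion from being fully met.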

As in the case of the test validation studies, directional contrasts of
means (B > A and C > A) and of variances (B < A and C < A) were predicted.
A nondirectional contrast, B versus C, was again tested.

Means and standard deviations for concept and composite scores are
reported in Table 9 for the pretest (A) and the two posttest groups
(B and C). As depicted in the design, the posttest information reported
under B is for the random half of the class who were pretested. The
posttest information under C is for the random half of the class who were
not pretested.



Insert Table 9 about here 

A comparison of the B means to the A means using a repeated measures
analysis of variance revealed a significant [F(1,22) = 4.47 to 61.49,
p < .01 to .05] difference for all concept scores and for the composite
score. For the pretested group there was a significant gain on all concepts.
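
With only two occasions (pretest and posttest), a repeated measures analysis of variance on a single group reduces to the square of the paired t statistic, with 1 and n - 1 degrees of freedom, which is why a group of 23 gives F(1,22). The sketch below shows that computation on made-up scores; the data and the function name are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def repeated_measures_f(pre, post):
    """F for a two-occasion repeated measures design (equals paired t squared)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    return t ** 2, (1, n - 1)


if __name__ == "__main__":
    # Hypothetical pretest and posttest scores for four students.
    f_value, df = repeated_measures_f([1, 2, 3, 4], [2, 4, 5, 5])
    print(f"F{df} = {f_value:.2f}")
```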

Using a simple analysis of variance, the C means were contrasted with
the A means. A statistically significant [F(1,51) = 9.06 to 34.06, p < .01]
difference was found for the composite score and the probing, productive
questioning, and reproductive questioning scores. For all concept scores,
the C means were greater than the A means; however, the difference did not
reach statistical significance for the informing, approving, and disapproving
concept scores. In addition, the B means were contrasted to the C means
and no difference was statistically significant for the two posttest groups.
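
Because the C versus A contrast involves two independent groups, the one-way F can be approximated from the means, standard deviations, and group sizes in Table 9 alone. The sketch below does this for a few scores. It is a rough check, not the authors' computation: the function and variable names are hypothetical, the inputs are the rounded table values as read from the scan, and the results therefore match the reported F values only approximately.

```python
def f_from_summary(mean_a, sd_a, n_a, mean_c, sd_c, n_c):
    """One-way ANOVA F for two independent groups, from summary statistics."""
    grand_mean = (n_a * mean_a + n_c * mean_c) / (n_a + n_c)
    ss_between = n_a * (mean_a - grand_mean) ** 2 + n_c * (mean_c - grand_mean) ** 2
    ss_within = (n_a - 1) * sd_a ** 2 + (n_c - 1) * sd_c ** 2
    df_within = n_a + n_c - 2
    return ss_between / (ss_within / df_within), (1, df_within)


# (mean, sd) for pretest group A and posttest-only group C, as read from Table 9.
N_A, N_C = 23, 30
TABLE_9 = {
    "Composite": ((89.5, 10.8), (101.2, 10.1)),
    "Probing": ((14.0, 1.8), (17.5, 2.5)),
    "Productive Questioning": ((15.1, 3.2), (17.4, 2.2)),
    "Reproductive Questioning": ((13.6, 3.8), (17.1, 2.8)),
    "Approving": ((15.2, 2.3), (16.0, 1.5)),
}


def contrast(score):
    """F and degrees of freedom for the C versus A contrast on one score."""
    (mean_a, sd_a), (mean_c, sd_c) = TABLE_9[score]
    return f_from_summary(mean_a, sd_a, N_A, mean_c, sd_c, N_C)


if __name__ == "__main__":
    for score in TABLE_9:
        f_value, df = contrast(score)
        print(f"{score}: F{df} = {f_value:.2f}")
```

Under these assumptions the four scores reported as significant give F values between roughly 9.6 and 32.2, inside the reported range of 9.06 to 34.06, while approving gives an F of about 2.3, below the .05 critical value for 1 and 51 degrees of freedom (approximately 4.03).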

The B < A comparisons of variances were tested using a t-test (for
differences between dependent samples). With the exception of probing,
the differences for all concept scores were in the predicted direction,
but none reached statistical significance (p < .05). The C < A comparisons
of variances yielded similar results. With the exception of probing, all
differences were in the predicted direction but failed to reach statistical
significance.
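
One common t-test for the difference between two variances measured on the same subjects is the Pitman-Morgan form, t = (s1^2 - s2^2) sqrt(n - 2) / (2 s1 s2 sqrt(1 - r^2)), with n - 2 degrees of freedom, where r is the pretest-posttest correlation. The paper does not say which formula was used, and r is not reported, so the sketch below is an illustration on made-up paired scores rather than a recomputation of the study's tests. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def dependent_variance_t(pre, post):
    """Pitman-Morgan t for the difference between two correlated variances.

    A positive t means the pretest variance exceeds the posttest variance.
    Undefined when the two sets of scores are perfectly correlated.
    """
    n = len(pre)
    var_pre, var_post = variance(pre), variance(post)
    mean_pre, mean_post = mean(pre), mean(post)
    cov = sum((a - mean_pre) * (b - mean_post) for a, b in zip(pre, post)) / (n - 1)
    r = cov / sqrt(var_pre * var_post)  # pretest-posttest correlation
    t = (var_pre - var_post) * sqrt(n - 2) / (2 * sqrt(var_pre * var_post * (1 - r ** 2)))
    return t, n - 2


if __name__ == "__main__":
    # Hypothetical scores: the posttest scores are bunched more tightly.
    t, df = dependent_variance_t([10, 12, 14, 16, 18], [15, 16, 16, 17, 18])
    print(f"t({df}) = {t:.2f}")
```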
There is evidence that the use of the Concept and Pattern films in
this instructional setting had a significant effect upon concept acquisition
as measured by the film-based test. The evidence suggests that there was
a gain on all concepts from a time period prior to instruction to a time
period after instruction. This gain was reflected in the differences in
means. The lack of a significant decrease in the variances, however,
suggests that there was not a convergence of the instructional group toward
a criterion. While it is clearly evident that several students reached
a high level of concept acquisition, the absence of a decrease in variance
indicates that other students did not reach a similarly high level.

It is quite likely that the failure to demonstrate a decrease in
variance is attributable to certain limitations of the specific course
conditions as a training setting. A protracted general training period,
week-long intervals between instructional sessions, and a large class
enrollment, for example, are all inimical to the kind of intensive instruction
that is probably involved in teaching towards a mastery criterion. The
adverse effect of such conditions is probably more pronounced for some
trainees than for others. Some degree of irregularity in attendance over
the general training period probably also helps to account for the absence
of a decrease in variance.

Substantiating evidence for these hypotheses is in fact provided by
the data reported earlier for the second validation study. In this case,
there were significant decreases in variance as well as significant
increases in means. Instruction in this group was characterized by shorter
instructional sessions occurring on a daily basis over a shorter general
training period. Furthermore, class size was less than half that of the
group described in this section.

In any event, it should be remembered that this protocol film series
was explicitly designed for use in typically varied instructional settings.
The fact that use of the series in two somewhat contrasting instructional
settings resulted in consistent average gains in concept acquisition is
an indication of the practical effectiveness of the series. It is clear,
however, that some course arrangements can be identified that more closely
approximate an effective training condition (as evidenced by their "power"
to move a group of trainees closer to a mastery level criterion).

Further Research

Having evidence available on (1) the validity of Categorizing Teacher
Behavior as a test and (2) the effectiveness of the Concepts and Patterns
film series as measured by that test opens the way for further investigation
of the relationship between acquisition of this specific set of concepts
and the teaching behaviors to which they refer. As suggested in the
introduction to this paper, it is quite plausible to hypothesize that
training in concepts referring to teaching behaviors might well result in
acquisition of these behaviors themselves. As long as one accepts incidence
of occurrence of behaviors as a performance criterion, considerable
indirect evidence and at least one body of direct evidence (Kleucker, 1975)
suggest that use of the Concepts and Patterns series can lead to behavioral
as well as conceptual change. Specifically, the authors of the present
paper are currently investigating the relationship between level of concept
acquisition, as evidenced by performance on Categorizing Teacher Behavior,
and incidence of use of the related skills in a simulated teaching setting.
Briefly, the authors hypothesize that acquisition of the specific concepts
measured by this film-based test is positively related to incidence of
using the related skills in instruction. It may be that the effect of
concept training on teaching performance has not been estimated accurately
in some past studies because of failure to use a test of demonstrated and
appropriate validity to measure concept acquisition.
Footnotes

1 The criterion for randomization was not fully met because of absences,
late arrivals, and other course administrative problems.
Table 1

NUMBER OF INSTANCES OF CONCEPTS
BY PART OF TEST

                                    Part of Test
Concept                       I      II     III    TOTAL

Probing                       4       -       4        8
Informing                     4       -       3        7
Approving                     6       6       -       12
Disapproving                  3       6       -        9
Productive Questioning        -       5       5       10
Reproductive Questioning      -       3       3        6

TOTAL                        17      20      15       52
Table 2

NUMBER OF EPISODES WITH A GIVEN NUMBER OF CONCEPTS

Number of Concepts              Number of Episodes
Per Episode           Part I   Part II   Part III   Total

0                        0        0         0          0
1                        4        3         6         13
2                        5        4         3         12
3                        1        3         1          5
4                        0        0         0          0

TOTAL                   10       10        10         30
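
Tables 1 and 2 describe the same test from two angles, so they can be cross-checked: weighting each episode count in Table 2 by its number of concepts should reproduce the per-part instance totals in Table 1. The sketch below performs that check. The variable and function names are hypothetical, empty cells in Table 1 are treated as zero, and the damaged Part III entry for informing is read as 3; both readings are assumptions that the totals happen to support.

```python
# Table 1: instances of each concept in Parts I, II, III (empty cells taken as 0).
INSTANCES = {
    "Probing": (4, 0, 4),
    "Informing": (4, 0, 3),
    "Approving": (6, 6, 0),
    "Disapproving": (3, 6, 0),
    "Productive Questioning": (0, 5, 5),
    "Reproductive Questioning": (0, 3, 3),
}

# Table 2: number of episodes in Parts I, II, III containing k concepts.
EPISODES = {
    0: (0, 0, 0),
    1: (4, 3, 6),
    2: (5, 4, 3),
    3: (1, 3, 1),
    4: (0, 0, 0),
}


def instances_per_part(instances):
    """Column totals of Table 1."""
    return tuple(sum(column) for column in zip(*instances.values()))


def instances_from_episodes(episodes):
    """Instances per part implied by Table 2 (concepts per episode x episodes)."""
    return tuple(
        sum(k * counts[part] for k, counts in episodes.items()) for part in range(3)
    )


if __name__ == "__main__":
    print("Table 1 totals by part:", instances_per_part(INSTANCES))
    print("Implied by Table 2:    ", instances_from_episodes(EPISODES))
```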
Table 3

MEANS AND STANDARD DEVIATIONS FOR COMPOSITE SCORES

Composite              A                          B                     C
Score         n      Mean      sd         n      Mean     sd     n      Mean     sd

Odd          41      94.2  [illegible]   41     104.2    6.5    48      98.7    7.0
Even         41      99.2     10.2       41     108.5    5.9    48     104.3    7.5
Total        41     193.3     20.2       41     212.7   10.1    48     203.0   12.3
Table 4

DESCRIPTIVE SUMMARY OF CONCEPT SCORES

                            Number         A               B               C
Concept Score                 of        (n = 41)        (n = 41)        (n = 48)
                            Items      Mean    sd      Mean    sd      Mean    sd

Total
  Probing                     20       30.4    3.6     35.2    2.8     34.3    3.6*
  Informing                   20       30.5    5.1     34.7    2.4     33.5    3.5
  Approving                   20       32.3    4.0     33.6    2.7     32.0*   3.5
  Disapproving                20       33.5    4.6     36.1    3.0     33.6    3.2
  Productive Questioning      20       33.3    6.4     36.3    2.8     34.4    3.1
  Reproductive Questioning    20       33.3    6.1     36.8    2.8     35.1    3.3

Odd
  Probing                     10       15.6    2.3     18.6    1.8     17.8    [illegible]
  Informing                   10       14.2    3.0     16.3    2.0     15.9    2.2
  Approving                   10       14.8    2.0     15.4    1.5     14.4*   2.3*
  Disapproving                10       16.9    2.5     18.1    1.9     17.3    1.8
  Productive Questioning      10       16.9    4.1     18.0    1.9     16.7*   2.2
  Reproductive Questioning    10       15.8    4.1     17.8    2.3     16.6    2.4

Even
  Probing                     10       14.8    2.0     16.6    1.7     16.5    2.2*
  Informing                   10       16.3    2.8     18.4    1.7     17.6    2.3
  Approving                   10       17.5    2.6     18.2    2.1     17.6    2.0
  Disapproving                10       16.5    2.6     18.0    1.9     16.3*   2.4
  Productive Questioning      10       16.5    3.0     18.3    1.9     17.7    2.2
  Reproductive Questioning    10       17.5    2.3     19.0    1.4     18.5    [illegible]

* Reversal of prediction relative to A
Table 5

COMPARISONS OF TOTAL CONCEPT SCORES AND ODD/EVEN CONCEPT SCORES

                     Total Concept Scores      Odd/Even Concept Scores
                       N        p                N         p

B-A Comparisons
  Means                6       .02              12       .0002
  sds                  6       .02              12       .0002

C-A Comparisons
  Means                5       .11               9       .07
  sds                  5       .11               9       .07
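
The probabilities in Table 5 are consistent with a one-tailed sign test, in which each comparison is treated as equally likely to fall in either direction and p is the binomial probability of observing at least the tabled number of comparisons in the predicted direction. That reading is an inference from the numbers, not something this section states, and the reading of the damaged cells (6 of 6 and 5 of 6 total-score comparisons, 12 of 12 and 9 of 12 odd/even comparisons) is likewise an assumption. A minimal sketch, with a hypothetical function name:

```python
from math import comb


def sign_test_p(k, n):
    """One-tailed probability of at least k of n comparisons in the predicted
    direction when each direction is equally likely."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    for k, n in [(6, 6), (12, 12), (5, 6), (9, 12)]:
        print(f"{k} of {n} in predicted direction: p = {sign_test_p(k, n):.4f}")
```

Rounded to the precision used in the table, these four cases give .02, .0002, .11, and .07.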
Table 6

MEANS AND STANDARD DEVIATIONS FOR COMPOSITE SCORES

Composite           A (Pre)                 B (Post)
Score         n      Mean     sd       n      Mean     sd

Odd          19      42.5     8.2     19      51.5     4.2
Even         19      41.0    11.9     19      51.3     4.8
Total        19      83.5    19.4     19     102.8     8.1
Table 7

DESCRIPTIVE SUMMARY OF CONCEPT SCORES

                            Number       A (Pre)         B (Post)
Concept Scores                of         (n = 19)        (n = 19)
                            Items      Mean    sd      Mean    sd

Total
  Probing                     20       13.3    2.7     17.9    1.9
  Informing                   20       13.4    3.9     16.7    1.6
  Approving                   20       14.6    3.3     15.8    2.2
  Disapproving                20       15.9    4.2     17.6    1.4
  Productive Questioning      20       14.5    3.9     17.9    2.2
  Reproductive Questioning    20       11.8    5.5     17.1    3.7

Odd
  Probing                     10        6.3    1.5      9.3    1.0
  Informing                   10        6.2    2.1      7.5    1.0
  Approving                   10        7.2    1.5      7.6    1.3
  Disapproving                10        8.1    2.3      9.1    0.6
  Productive Questioning      10        7.5    1.7      8.8    1.2
  Reproductive Questioning    10        4.8    3.6      8.2    2.4

Even
  Probing                     10        7.0    1.7      8.6    1.2
  Informing                   10        7.3    2.1      9.2    1.0
  Approving                   10        7.5    2.3      8.2    1.3
  Disapproving                10        7.7    2.1      8.5    1.0
  Productive Questioning      10        7.0    2.5      8.9    1.6
  Reproductive Questioning    10        7.0    2.3      8.8    1.5





Table 8

COMPARISONS OF TOTAL CONCEPT SCORES AND ODD/EVEN CONCEPT SCORES

             Total Concept Scores      Odd/Even Concept Scores
               N        p                N         p

Means          6       .02              12       .0002
sds            6       .02              12       .0002
Table 9

DESCRIPTIVE SUMMARY OF CONCEPT SCORES
AND COMPOSITE SCORE

                            Number         A               B                  C
Score                         of        (n = 23)        (n = 23)           (n = 30)
                            Items      Mean    sd      Mean    sd         Mean    sd

Probing                       20       14.0    1.8     17.9    2.2*       17.5    2.5*
Informing                     20       15.3    2.7     17.0    1.5        16.6    2.7
Approving                     20       15.2    2.3     16.6    2.1        16.0    1.5
Disapproving                  20       16.3    2.7     17.1    2.2        16.7    1.9
Productive Questioning        20       15.1    3.2     17.4    1.7        17.4    2.2
Reproductive Questioning      20       13.6    3.8     17.7    2.0        17.1    2.8
Composite                    120       89.5   10.8    103.7  [illegible] 101.2   10.1

* Reversal of predicted direction.
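
The gains discussed in the text can be read directly off Table 9 by subtracting the pretest mean (A) from each posttest mean (B and C). The sketch below tabulates those differences and checks that the concept means add up to the composite means within rounding. The names are hypothetical, and the damaged informing pretest mean is read as 15.3, an assumption that the composite total of 89.5 supports.

```python
# Mean scores from Table 9 as (A pretest, B posttest, C posttest).
CONCEPT_MEANS = {
    "Probing": (14.0, 17.9, 17.5),
    "Informing": (15.3, 17.0, 16.6),
    "Approving": (15.2, 16.6, 16.0),
    "Disapproving": (16.3, 17.1, 16.7),
    "Productive Questioning": (15.1, 17.4, 17.4),
    "Reproductive Questioning": (13.6, 17.7, 17.1),
}
COMPOSITE_MEANS = (89.5, 103.7, 101.2)


def mean_gains(means):
    """Gain of each posttest group over the pretest group, per concept."""
    return {name: (round(b - a, 1), round(c - a, 1)) for name, (a, b, c) in means.items()}


def column_sums(means):
    """Sum of the six concept means for groups A, B, and C."""
    return tuple(round(sum(row[i] for row in means.values()), 1) for i in range(3))


if __name__ == "__main__":
    for name, (gain_b, gain_c) in mean_gains(CONCEPT_MEANS).items():
        print(f"{name}: B - A = {gain_b}, C - A = {gain_c}")
    print("Sum of concept means:", column_sums(CONCEPT_MEANS))
    print("Tabled composite:    ", COMPOSITE_MEANS)
```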